Docket No. 02-4020 



WHAT IS CLAIMED IS:

1. A method for performing speaker adaptation in a speech recognition system,
comprising: 

receiving an audio segment; 

determining whether the audio segment is a first audio segment associated with a speaker
turn;

decoding the audio segment to generate a transcription associated with the first audio 
segment when the audio segment is the first audio segment; 

estimating a transformation matrix based on the transcription associated with the first 
audio segment; and 

decoding the audio segment using the transformation matrix to generate a transcription 
associated with a subsequent audio segment when the audio segment is not the first audio 
segment. 
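
By way of illustration only, the steps recited in claim 1 may be sketched as follows. The names `SpeakerTurnAdapter`, `decode`, `estimate`, `toy_decode`, and `toy_estimate` are hypothetical and do not appear in the claims; the toy decoder and estimator are placeholders standing in for a real recognizer and a real estimation procedure, which the claims do not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Matrix = List[List[float]]


@dataclass
class SpeakerTurnAdapter:
    """Hypothetical sketch of the control flow of claim 1."""
    # decode(segment, transform) -> transcription; transform may be None.
    decode: Callable[[str, Optional[Matrix]], str]
    # estimate(segment, transcription) -> transformation matrix.
    estimate: Callable[[str, str], Matrix]
    transform: Optional[Matrix] = None

    def process(self, segment: str, is_first: bool) -> str:
        if is_first:
            # First segment of a speaker turn: decode without adaptation,
            # then estimate the matrix from the resulting transcription.
            transcription = self.decode(segment, None)
            self.transform = self.estimate(segment, transcription)
        else:
            # Subsequent segment: decode using the previously estimated matrix.
            transcription = self.decode(segment, self.transform)
        return transcription


def toy_decode(segment: str, transform: Optional[Matrix]) -> str:
    # Placeholder decoder: only records whether a matrix was supplied.
    return f"{segment}|{'adapted' if transform is not None else 'unadapted'}"


def toy_estimate(segment: str, transcription: str) -> Matrix:
    # Placeholder estimator: returns a 2x2 identity matrix.
    return [[1.0, 0.0], [0.0, 1.0]]


adapter = SpeakerTurnAdapter(decode=toy_decode, estimate=toy_estimate)
first = adapter.process("seg1", is_first=True)
second = adapter.process("seg2", is_first=False)
```

In this sketch the caller supplies the `is_first` flag, which corresponds to the determining step; how that determination is made is left open here.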

2. The method of claim 1, wherein the determining whether the audio segment is a 
first audio segment includes: 

receiving information identifying a start of the speaker turn, and 

identifying the audio segment as the first audio segment based on the information. 

3. The method of claim 1, wherein the determining whether the audio segment is a 
first audio segment includes: 

identifying a start of the speaker turn. 

4. The method of claim 3, further comprising:

resetting the transformation matrix upon identifying the start of the speaker turn. 



5. The method of claim 1, further comprising: 

reestimating the transformation matrix based on the transcription associated with the 
subsequent audio segment to obtain a reestimated transformation matrix. 



6. The method of claim 5, further comprising:

receiving another audio segment associated with the speaker turn; and 

decoding the other audio segment using the reestimated transformation matrix. 



7. The method of claim 1, further comprising:

applying the transformation matrix to one or more acoustic models. 

8. The method of claim 7, wherein the decoding the audio segment using the 
transformation matrix includes: 

using the one or more acoustic models to generate the transcription associated with the 
subsequent audio segment. 

9. The method of claim 1, wherein the estimating a transformation matrix includes:

constructing a matrix using features associated with straight cepstrals corresponding to
the audio segment, and

replicating the matrix to generate the transformation matrix. 
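
By way of illustration only, one possible reading of the replicating step is sketched below. It assumes, as an interpretation not stated in the claims, that the feature vector consists of straight cepstral coefficients followed by equally sized derivative blocks, and that replicating means placing copies of the smaller matrix along the diagonal of a larger one. The function name `replicate_block_diagonal` and the `copies` parameter are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def replicate_block_diagonal(block: Matrix, copies: int = 3) -> Matrix:
    """Place `copies` copies of a square `block` along the diagonal.

    Assumption: one copy per feature stream (for example, cepstra and
    their derivatives); entries outside the diagonal blocks are zero.
    """
    n = len(block)
    size = n * copies
    full = [[0.0] * size for _ in range(size)]
    for c in range(copies):
        offset = c * n
        for i in range(n):
            for j in range(n):
                full[offset + i][offset + j] = block[i][j]
    return full


small = [[1.0, 2.0], [3.0, 4.0]]
full_transform = replicate_block_diagonal(small, copies=2)
```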

10. The method of claim 1, wherein the estimating a transformation matrix includes: 
using a statistical alignment technique to obtain values for the transformation matrix. 

11. The method of claim 10, wherein the statistical alignment technique is a Viterbi
alignment technique. 


12. A system for performing speaker adaptation when performing speech recognition, 
comprising: 

means for receiving an audio segment; 

means for identifying the audio segment as a first audio segment or a subsequent audio 
segment associated with a speaker turn; 

means for decoding the audio segment to generate a transcription associated with the first 
audio segment when the audio segment is the first audio segment; 

means for estimating a transformation matrix based on the transcription associated with 
the first audio segment; and 

means for decoding the audio segment using the transformation matrix to generate a 
transcription associated with the subsequent audio segment when the audio segment is the 
subsequent audio segment. 

13. A decoder within a speech recognition system, comprising:

a forward decoding stage; 

a backward decoding stage; and 

a rescoring stage; 

at least one of the forward decoding stage, the backward decoding stage, and the 
rescoring stage being configured to: 

receive an audio segment, 

identify the audio segment as a first audio segment or a subsequent audio segment 
associated with a speaker turn, 

decode the audio segment to generate a transcription associated with the first 
audio segment when the audio segment is the first audio segment, 

estimate a transformation matrix based on the transcription associated with the 
first audio segment, and 

decode the audio segment using the transformation matrix to generate a 
transcription associated with the subsequent audio segment when the audio segment is the 
subsequent audio segment. 



14. The decoder of claim 13, wherein when identifying the audio segment, the at least 
one of the forward decoding stage, the backward decoding stage, and the rescoring stage is 
configured to: 

receive information identifying a start of the speaker turn, and 

identify the audio segment as the first audio segment when the information is received. 

15. The decoder of claim 13, wherein when identifying the audio segment, the at least
one of the forward decoding stage, the backward decoding stage, and the rescoring stage is 
configured to: 

identify a start of the speaker turn. 

16. The decoder of claim 15, wherein the at least one of the forward decoding stage, 
the backward decoding stage, and the rescoring stage is further configured to: 

reset the transformation matrix upon identifying the start of the speaker turn. 

17. The decoder of claim 13, wherein the at least one of the forward decoding stage, 
the backward decoding stage, and the rescoring stage is further configured to: 

reestimate the transformation matrix based on the transcription associated with the 
subsequent audio segment to obtain a reestimated transformation matrix. 

18. The decoder of claim 17, wherein the at least one of the forward decoding stage, the
backward decoding stage, and the rescoring stage is further configured to: 

receive another audio segment associated with the speaker turn, and 
decode the other audio segment using the reestimated transformation matrix. 

19. The decoder of claim 13, wherein the at least one of the forward decoding stage, 
the backward decoding stage, and the rescoring stage is further configured to: 

apply the transformation matrix to one or more acoustic models. 

20. The decoder of claim 19, wherein when decoding the audio segment using the 
transformation matrix, the at least one of the forward decoding stage, the backward decoding 
stage, and the rescoring stage is configured to: 

use the one or more acoustic models to generate the transcription associated with the 
subsequent audio segment. 

21. The decoder of claim 13, wherein when estimating a transformation matrix, the at 
least one of the forward decoding stage, the backward decoding stage, and the rescoring stage is 
configured to: 

construct a matrix using features associated with straight cepstrals corresponding to the 
audio segment, and 

replicate the matrix to generate the transformation matrix. 

22. The decoder of claim 13, wherein when estimating a transformation matrix, the at 
least one of the forward decoding stage, the backward decoding stage, and the rescoring stage is 
configured to: 

use a statistical alignment technique to obtain values for the transformation matrix. 

23. The decoder of claim 22, wherein the statistical alignment technique is a Viterbi 
alignment technique. 

24. The decoder of claim 13, wherein the backward decoding stage is configured to 
use transcriptions from the forward decoding stage when estimating the transformation matrix. 

25. The decoder of claim 24, wherein the backward decoding stage is configured to 
wait until transcriptions corresponding to the entire speaker turn are received before estimating 
the transformation matrix. 

26. The decoder of claim 13, wherein the rescoring stage is configured to use 
transcriptions from at least one of the forward decoding stage and the backward decoding stage 
when estimating the transformation matrix. 

27. The decoder of claim 26, wherein the rescoring stage is configured to wait until 
transcriptions corresponding to the entire speaker turn are received before estimating the 
transformation matrix. 

28. A speech recognition system, comprising:
speaker change detection logic configured to: 

receive a plurality of audio segments, and 

identify boundaries between speakers associated with the audio segments as 
speaker turns; and 
a decoder configured to: 

receive, from the speaker change detection logic, one of the audio segments as a
received audio segment associated with one of the speaker turns, 

identify the received audio segment as a first audio segment or a subsequent audio 
segment associated with the speaker turn, 

decode the received audio segment to generate a transcription associated with the 
first audio segment when the received audio segment is the first audio segment, 

construct a transformation matrix based on the transcription associated with the 
first audio segment, and 

decode the received audio segment using the transformation matrix to generate a 
transcription associated with the subsequent audio segment when the received audio 
segment is the subsequent audio segment. 